Custom Report
1 Introduction
This report is one of 493,845 that I will make, and one of 104,070,413 that could be made.
I “took” the 1.4 TB of LinkedIn data that was breached in 2020 and turned it into insights to power my job hunt.
The insights I can share in this report, which also relate to my goals, are:
- Industry-based recruitment trends.
- Company-based workforce timeline.
- Current/past workforce info:
  - Basic info: name, job title, status, social links. I could add geolocation for those who have the data, but it would look creepy.
  - Their work period.
  - Their experience.
2 About me
Salutations; I’m Joseph, a self-taught data analyst, engineer, and scraper.
Despite life’s challenges, my goal remains a remote job, either full-time or part-time, and friends to tackle the challenges of this changing world with.
To show my skills and dedication, I built this project, which produced this tailored report.
3 About the project
3.1 How this project came to life
You would know by now, from my email, that I am hunting for a job.
About a year ago, I scraped contact info from Google Maps to get my first job. Later I scraped contacts from the LinkedIn website… you can check how that went here.
Recently, I finally got around to learning SQL because of DuckDB, software that lets you process larger-than-memory data on your local machine by spilling to disk when the data does not fit in RAM. Then I remembered the leaked LinkedIn data that I had not been able to process.
Thus began my journey to learn SQL, process the data, and make something out of it.
3.2 The process
Everything ran on my local machine, in the following steps.
3.2.1 Downloaded the leaked data
I downloaded the data from a torrent.
There were around 700 .gz files, each around 280 MB; 196 GB in total.
Each .gz file contains a 2 GB file; 1.4 TB in total.
Each file has multiple lines, and each line is a JSON object. The file as a whole is not a single JSON document; it simply holds one JSON object per line.
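To make the file layout concrete, here is a minimal sketch of reading such an archive one line at a time using only the Python standard library. The file name and the two sample records are made up for illustration; they are not taken from the actual data, and this is not necessarily how my own script read the files.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Iterator


def iter_json_lines(gz_path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzipped text file."""
    # "rt" decompresses on the fly and decodes to text, so the full
    # uncompressed file never has to sit in memory at once.
    with gzip.open(gz_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def write_sample(gz_path: Path) -> None:
    """Create a tiny stand-in archive with two made-up records."""
    rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    with gzip.open(gz_path, "wt", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.json.gz"  # hypothetical file name
        write_sample(sample)
        for record in iter_json_lines(sample):
            print(record)
```

Calling `json.load` on the whole file would fail here, because the file is a sequence of JSON objects, not one JSON document.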
3.2.2 Processing the weird data
In this phase I created a script that automatically opens an archive, processes the file, and saves it as a Parquet file with a compression level of 22.
I used Python, Pathlib, Polars, and a lot of patience.
The process took around 20 minutes per file; in total it took around three weeks (I had to shut down my PC at night). The result was 700 Parquet files, each around 190 MB; 133 GB in total.
3.2.3 Making relational database
The data in the datasets was nested, especially the “experience” field, which held both a person’s experience and the company info. The problem is that the same company info gets repeated many times, across all datasets.
A relational database solves this and makes exploratory data analysis easier.
The code was split in two:
1. I used Polars to split each of the 700 datasets into mini relational databases.
2. I used DuckDB to merge all the mini relational databases and remove duplicates in some tables, mainly company and university information.
The result was a relational database that is 73 GB in size: from 1.4 TB down to 73 GB.
All of this ran on my PC, so no servers were harmed, only my CPU fan and my ears.
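The idea behind the split can be shown with a small plain-Python sketch. It stands in for the Polars and DuckDB code, which is not reproduced here; the field names (`name`, `experience`, `company`, `title`, `industry`) and the sample records are hypothetical, and matching companies by lowercased name is a simplifying assumption.

```python
from typing import Iterable


def normalize(people: Iterable[dict]) -> tuple[list[dict], list[dict], list[dict]]:
    """Split nested person records into person, company, and experience rows."""
    persons: list[dict] = []
    companies: dict[str, dict] = {}  # keyed by normalized name to drop repeats
    experiences: list[dict] = []

    for person_id, person in enumerate(people, start=1):
        persons.append({"person_id": person_id, "name": person["name"]})
        for job in person.get("experience", []):
            company = job["company"]
            # Assumption: two companies are the same if their names match
            # after trimming whitespace and lowercasing.
            key = company["name"].strip().lower()
            if key not in companies:
                companies[key] = {"company_id": len(companies) + 1, **company}
            # The experience row keeps only a reference to the company.
            experiences.append({
                "person_id": person_id,
                "company_id": companies[key]["company_id"],
                "title": job["title"],
            })
    return persons, list(companies.values()), experiences


if __name__ == "__main__":
    sample = [
        {"name": "Person A", "experience": [
            {"title": "Analyst",
             "company": {"name": "Example Co", "industry": "research"}}]},
        {"name": "Person B", "experience": [
            {"title": "Manager",
             "company": {"name": "example co ", "industry": "research"}}]},
    ]
    persons, companies, experiences = normalize(sample)
    print(len(persons), len(companies), len(experiences))
```

In the sample, the company details are stored once and referenced twice, which is where the size reduction comes from when the same company appears across many records.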
3.2.4 Filter
I filtered the companies based on their industry, country, and whether I have the email of one of the higher-ups.
4 General graphs
4.1 market research industry’s yearly new recruit count
4.2 aba market research ltd.’s workforce status over the years
5 Workforce sample
5.1 Annie Young
Job title: Senior research executive
Associated: True
Socials: https://linkedin.com/in/annie-young-7aa0a253
5.1.1 Annie Young’s working period at aba market research ltd.
5.1.2 Gantt plot of Annie Young’s experience
5.2 Bev Ferey
Job title: ****
Associated: True
Socials: https://linkedin.com/in/bev-ferey-73299388
5.2.1 Bev Ferey’s working period at aba market research ltd.
5.2.2 Gantt plot of Bev Ferey’s experience
5.3 Charley Jackson
Job title: Research manager
Associated: True
Socials: https://linkedin.com/in/charley-jackson-92596938 | https://facebook.com/charleyljackson
5.3.1 Charley Jackson’s working period at aba market research ltd.
5.3.2 Gantt plot of Charley Jackson’s experience
5.4 Chelsea Gray
Job title: Junior graphic designer
Associated: True
Socials: https://twitter.com/cgraydesigns | https://linkedin.com/in/chelsea-gray
5.4.1 Chelsea Gray’s working period at aba market research ltd.
5.4.2 Gantt plot of Chelsea Gray’s experience
5.5 Cheryl Anderson
Job title: Associate director
Associated: False
Socials: https://linkedin.com/in/cheryl-anderson-37479a16 | https://facebook.com/cheryl.anderson.58152 | https://linkedin.com/in/dr-cheryl-anderson-37479a16
5.5.1 Cheryl Anderson’s working period at aba market research ltd.
5.5.2 Gantt plot of Cheryl Anderson’s experience
5.6 Daisy Turner
Job title: Marketing manager
Associated: True
Socials: https://linkedin.com/in/daisy-turner-2084603a
5.6.1 Daisy Turner’s working period at aba market research ltd.
5.6.2 Gantt plot of Daisy Turner’s experience
5.7 Daisy-May Parsons
Job title: Market research
Associated: False
Socials: https://facebook.com/daisymay.parsons | https://linkedin.com/in/daisymayparsons
5.7.1 Daisy-May Parsons’s working period at aba market research ltd.
5.7.2 Gantt plot of Daisy-May Parsons’s experience
5.8 Danielle Corwin
Job title: Senior research executive
Associated: True
Socials: https://linkedin.com/in/danielle-corwin-05b4451a
5.8.1 Danielle Corwin’s working period at aba market research ltd.
5.8.2 Gantt plot of Danielle Corwin’s experience
5.9 Dean Murley
Job title: Associate director
Associated: False
Socials: https://linkedin.com/in/deanmurley | https://twitter.com/tweetingtiger
5.9.1 Dean Murley’s working period at aba market research ltd.
5.9.2 Gantt plot of Dean Murley’s experience
5.10 Emily Spillett
Job title: Associate director
Associated: True
Socials: https://linkedin.com/in/emily-spillett-9b081036
5.10.1 Emily Spillett’s working period at aba market research ltd.
5.10.2 Gantt plot of Emily Spillett’s experience
5.11 Fenna Maynard
Job title: Intern
Associated: False
Socials: https://linkedin.com/in/fenna-maynard-09079baa
5.11.1 Fenna Maynard’s working period at aba market research ltd.
5.11.2 Gantt plot of Fenna Maynard’s experience
5.12 Jade Annan
Job title: Research manager
Associated: True
Socials: https://linkedin.com/in/jade-annan-636aa838
5.12.1 Jade Annan’s working period at aba market research ltd.
5.12.2 Gantt plot of Jade Annan’s experience
5.13 Joanna Jones
Job title: Project manager
Associated: False
Socials: https://linkedin.com/in/joannajoneslondon
5.13.1 Joanna Jones’s working period at aba market research ltd.
5.13.2 Gantt plot of Joanna Jones’s experience
5.14 Joanna Rogers
Job title: Senior research executive
Associated: True
Socials: https://linkedin.com/in/joanna-crane-7a974947
5.14.1 Joanna Rogers’s working period at aba market research ltd.
5.14.2 Gantt plot of Joanna Rogers’s experience
5.15 Karen Heapy
Job title: Head of facilities
Associated: True
Socials: https://linkedin.com/in/karen-heapy-9a372623
5.15.1 Karen Heapy’s working period at aba market research ltd.
5.15.2 Gantt plot of Karen Heapy’s experience
5.16 Kateryna Butko
Job title: ****
Associated: True
Socials: https://linkedin.com/in/kateryna-butko-130a144b
5.16.1 Kateryna Butko’s working period at aba market research ltd.
5.16.2 Gantt plot of Kateryna Butko’s experience
5.17 Laura Kelly
Job title: Senior proposals manager
Associated: True
Socials: https://linkedin.com/in/laura-kelly-89096a8a | https://linkedin.com/in/laura-kelly
5.17.1 Laura Kelly’s working period at aba market research ltd.
5.17.2 Gantt plot of Laura Kelly’s experience
5.18 Laura Robinson
Job title: Research director
Associated: True
Socials: https://twitter.com/abaresearch | https://linkedin.com/in/laura-robinson-63184340 | https://facebook.com/625744
5.18.1 Laura Robinson’s working period at aba market research ltd.
5.18.2 Gantt plot of Laura Robinson’s experience
5.19 Lucy Assaker
Job title: Human resources administrator
Associated: False
Socials: https://linkedin.com/in/lucy-assaker-4bb47683
5.19.1 Lucy Assaker’s working period at aba market research ltd.
5.19.2 Gantt plot of Lucy Assaker’s experience
5.20 Matthew Goddard
Job title: Research executive
Associated: True
Socials: https://linkedin.com/in/matthewgoddard25
5.20.1 Matthew Goddard’s working period at aba market research ltd.
5.20.2 Gantt plot of Matthew Goddard’s experience
5.21 Nick Bonney
Job title: Managing director
Associated: False
Socials: https://linkedin.com/in/nick-bonney-5165b2 | https://linkedin.com/in/nickbonney | https://twitter.com/tweetdeepblue
5.21.1 Nick Bonney’s working period at aba market research ltd.
5.21.2 Gantt plot of Nick Bonney’s experience
5.22 Simone Thomson
Job title: Im consultant
Associated: False
Socials: https://linkedin.com/in/simone-thomson-6632138
5.22.1 Simone Thomson’s working period at aba market research ltd.
5.22.2 Gantt plot of Simone Thomson’s experience
5.23 Sophie Graystone
Job title: Head of data services
Associated: True
Socials: https://linkedin.com/in/sophie-graystone-6a96b112
5.23.1 Sophie Graystone’s working period at aba market research ltd.
5.23.2 Gantt plot of Sophie Graystone’s experience
5.24 Steph Barker
Job title: Assistant accountant
Associated: True
Socials: https://linkedin.com/in/stephanie-barker-93850617
5.24.1 Steph Barker’s working period at aba market research ltd.
5.24.2 Gantt plot of Steph Barker’s experience
5.25 Tanzia Nasrin
Job title: Comment coder
Associated: True
Socials: https://linkedin.com/in/tanzia-nasrin-4b4a4876
5.25.1 Tanzia Nasrin’s working period at aba market research ltd.
5.25.2 Gantt plot of Tanzia Nasrin’s experience
5.26 Vix Dimmock
Job title: Fieldwork manager
Associated: False
Socials: https://linkedin.com/in/boughtobeauty
5.26.1 Vix Dimmock’s working period at aba market research ltd.
5.26.2 Gantt plot of Vix Dimmock’s experience
5.27 Will Norris
Job title: Content creator
Associated: False
Socials: https://linkedin.com/in/will-norris-73480b100 | https://linkedin.com/in/william-norris-73480b100 | https://linkedin.com/in/will-norris